Efficient Reduction for Wait-Free Termination Detection in a Crash-Prone Distributed System

Authors

  • Neeraj Mittal
  • Felix C. Freiling
  • Subbarayan Venkatesan
  • Lucia Draque Penso
Abstract

We investigate the problem of detecting termination of a distributed computation in systems where processes can fail by crashing. Specifically, when the communication topology is fully connected, we describe a way to transform any termination detection algorithm A that has been designed for a failure-free environment into a termination detection algorithm B that can tolerate process crashes. Our transformation assumes the existence of a perfect failure detector. We show that a perfect failure detector is in fact necessary to solve the termination detection problem in a crash-prone distributed system even if at most one process can crash. Let μ(n,M) and δ(n,M) denote the message complexity and detection latency, respectively, of A when the system has n processes and the underlying computation exchanges M application messages. The message complexity of B is at most O(n + μ(n, 0)) messages per failure more than the message complexity of A. Also, its detection latency is at most O(δ(n, 0)) per failure more than that of A. Furthermore, the overhead (that is, the amount of control data piggybacked) on an application message increases by only O(log n) bits per failure. The fault-tolerant termination detection algorithm resulting from the transformation satisfies two desirable properties. First, it can tolerate failure of up to n−1 processes, that is, it is wait-free. Second, it does not impose any overhead on the fault-sensitive termination detection algorithm until one or more processes crash, that is, it is fault-reactive. Our transformation can be extended to arbitrary communication topologies provided process crashes do not partition the system.
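The per-failure bounds stated in the abstract can be written out as a rough sketch for a run with a given number of crashes. The symbols f (number of processes that actually crash), μ_B, δ_B, ω_A and ω_B (the message complexity and detection latency of B, and the per-message piggyback overhead of A and B) are notation assumed here for illustration; they do not appear in the abstract.

```latex
% Assumed notation: f = number of crashes in the run, 0 <= f <= n-1.
% mu, delta are the complexity measures of A as defined in the abstract.
\begin{align*}
  \mu_B(n, M, f)    &\le \mu(n, M)    + f \cdot O\bigl(n + \mu(n, 0)\bigr)
      && \text{(message complexity)} \\
  \delta_B(n, M, f) &\le \delta(n, M) + f \cdot O\bigl(\delta(n, 0)\bigr)
      && \text{(detection latency)} \\
  \omega_B(n, f)    &\le \omega_A(n)  + f \cdot O(\log n)
      && \text{(piggybacked bits per application message)}
\end{align*}
```

Read this way, setting f = 0 makes every additive term vanish, which matches the fault-reactive property, and the bounds remain meaningful up to f = n − 1, which matches the wait-free property.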

Similar resources

Efficient Reductions for Wait-Free Termination Detection in Faulty Distributed Systems

We investigate the problem of detecting termination of a distributed computation in asynchronous systems where processes can fail by crashing. More specifically, for both fully and arbitrarily connected communication topologies, we describe efficient ways to transform any fault-sensitive termination detection algorithm A that has been designed for a failure-free environment into a wait-free ...

On termination detection in crash-prone distributed systems with failure detectors

We investigate the problem of detecting termination of a distributed computation in systems where processes can fail by crashing. Specifically, when the communication topology is fully connected, we describe a way to transform any termination detection algorithm A that has been designed for a failure-free environment into a termination detection algorithm B that can tolerate process crashes. Ou...

Digital Fountains and Their Application to Informed Content Delivery over Adaptive Overlay Networks

Securing the net: challenges, failures and directions; Coterie availability in sites; Keeping denial-of-service attackers in the dark; On conspiracies and hyperfairness in distributed computing; On the availability of non-strict quorum systems; Musical benches; Obstruction-free algorithms can be practically wait-free; Efficient reduction for wait-free terminat...

Survey of Distributed Decision

We survey the recent distributed computing literature on checking whether a given distributed system configuration satisfies a given boolean predicate, i.e., whether the configuration is legal or illegal w.r.t. that predicate. We consider classical distributed computing environments, including mostly synchronous fault-free network computing (LOCAL and CONGEST models), but also asynchronous cras...

Termination Detection in Systems Where Processes May Crash and Recover

An algorithm solving the termination detection problem observes a computation of a distributed system and announces “termination” if the computation has come to an end. This work addresses termination detection in systems where processes fail by crashing and may restart later on. A new definition of robust-restricted termination, sensible in the crash-recovery model, is developed. A computation...

Publication date: 2005